A Framework for Semi-Automatic Development of Rule-based Information Extraction Applications

Authors

Peter Klügl

Martin Atzmüller

Tobias Hermann

Frank Puppe

Abstract

For the successful processing and handling of (large-scale) document collections, effective information extraction methods are essential. This paper presents a framework for the semi-automatic development of rule-based information extraction applications based on the TEXTMARKER language and utilizing machine learning methods. We describe the approach in detail and present the TEXTRULER system as an implementation of the proposed approach.

Similar resources

Development of an Automatic Land Use Extraction System in Urban Areas using VHR Aerial Imagery and GIS Vector Data

Lack of detailed land use (LU) information and efficient data collection methods have made the modeling of urban systems difficult. This study aims to develop a novel hierarchical rule-based LU extraction framework using geographic vector and remotely sensed (RS) data, in order to extract detailed subzonal LU information, residential LU in this study. The LU extraction system is developed to ex...

Rule-Based Information Extraction for Structured Data Acquisition using TextMarker

Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the acquired data can be applied for mining methods requiring structured input data, in contrast to other text mining methods that utilize a bag-of-words approach. This paper presents a semi-automatic approach for structur...

A Domain-Independent Approach to IE Rule Development

A key element for the extraction of information in a natural language document is a set of shallow text analysis rules, which are typically based on pre-defined linguistic patterns. Current Information Extraction research aims at the automatic or semi-automatic acquisition of these rules. Within this research framework, we consider in this paper the potential for acquiring generic extraction pa...

UIMA Ruta: Rapid development of rule-based information extraction applications

Rule-based information extraction is an important approach for processing the increasingly available amount of unstructured data. The manual creation of rule-based applications is a time-consuming and tedious task, which requires qualified knowledge engineers. The costs of this process can be reduced by providing a suitable rule language and extensive tooling support. This paper presents UIMA R...

Modal Keywords, Ontologies, and Reasoning for Video Understanding

We proposed a novel framework for video content understanding that uses rules constructed from knowledge bases and multimedia ontologies. Our framework consists of an expert system that uses a rule-based engine, domain knowledge, visual detectors (for objects and scenes), and metadata (text from automatic speech recognition, related text, etc.). We introduce the idea of modal keywords, which ar...

Publication date: 2009